Soft Correspondences in Multimodal Scene Parsing

Authors

Sarah Taghavi Namin

Mohammad Najafi

Mathieu Salzmann

Lars Petersson

Abstract

Exploiting multiple modalities for semantic scene parsing has been shown to improve accuracy over the single-modality scenario. However, multimodal datasets often suffer from problems such as data misalignment and label inconsistencies, whereas existing methods assume that corresponding regions in two modalities must have identical labels. We propose to address this issue by formulating multimodal semantic labeling as inference in a CRF and introducing latent nodes to explicitly model inconsistencies between the two modalities. These latent nodes allow us not only to leverage information from both domains to improve their labeling, but also to cut the edges between inconsistent regions. We propose to learn intra-domain and inter-domain potential functions from training data to avoid hand-tuning of the model parameters. We evaluate our approach on two publicly available datasets containing 2D and 3D data. Thanks to our latent nodes and our learning strategy, our method outperforms the state of the art in both cases. Moreover, to highlight the benefits of geometric information and the potential of our method for simultaneous 2D/3D semantic and geometric inference, we performed joint inference of semantic and geometric classes in both 2D and 3D, which improved the labeling results on both datasets.
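To make the role of the latent nodes concrete, here is a minimal toy sketch, not the authors' model. It assumes a single 2D region and a single 3D region, hand-set unary costs instead of learned potentials, a Potts penalty between the two domains, and a binary latent variable `linked` that either keeps or cuts the inter-domain edge at a fixed cost. All names (`LABELS`, `pair_weight`, `cut_cost`, `map_inference`) and all numbers are hypothetical, and inference is done by brute-force enumeration.

```python
from itertools import product

# Hypothetical label set shared by the 2D and 3D domains.
LABELS = ["road", "building", "tree"]


def energy(y2d, y3d, linked, unary_2d, unary_3d, pair_weight, cut_cost):
    """Energy of one joint assignment (lower is better).

    y2d, y3d : labels of the 2D and the 3D region
    linked   : latent binary state; True keeps the inter-domain edge,
               False cuts it (the regions are treated as inconsistent)
    """
    e = unary_2d[y2d] + unary_3d[y3d]
    if linked:
        # Potts penalty: linked regions pay for disagreeing labels.
        e += 0.0 if y2d == y3d else pair_weight
    else:
        # Cutting the edge has a fixed cost, so it is only chosen
        # when the two domains disagree strongly enough.
        e += cut_cost
    return e


def map_inference(unary_2d, unary_3d, pair_weight=1.0, cut_cost=0.6):
    """Exhaustive MAP search over both labels and the latent link state."""
    return min(
        product(LABELS, LABELS, (True, False)),
        key=lambda s: energy(*s, unary_2d, unary_3d, pair_weight, cut_cost),
    )


# Case 1: 3D evidence is weak, 2D evidence is strong for "road".
weak_3d = map_inference(
    {"road": 0.1, "building": 0.9, "tree": 0.8},
    {"road": 0.5, "building": 0.4, "tree": 0.9},
)

# Case 2: both domains are confident but disagree (e.g. misalignment).
conflict = map_inference(
    {"road": 0.1, "building": 2.0, "tree": 2.0},
    {"road": 2.0, "building": 0.1, "tree": 2.0},
)

print(weak_3d)
print(conflict)
```

In the first case the edge stays on and the confident 2D evidence pulls the 3D region to the same label. In the second case cutting the edge is cheaper than forcing agreement, so each region keeps its own label. In the paper the potentials are learned from training data rather than set by hand as they are here.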

Similar resources

Multimodal Stereo Vision For Reconstruction In The Presence Of Reflection

Traditional stereo approaches assume a Lambertian scene, an assumption that is violated in the presence of specular reflections. A variety of techniques have been developed to detect and reconstruct these surfaces [1, 4] using a variety of constraints; in this work, however, we attempt to reconstruct a reflecting surface and a reflected scene using different imaging modalities. Using a four came...

Deixis and Conjunction in Multimodal Systems

In order to realize their full potential, multimodal interfaces need to support not just input from multiple modes, but single commands optimally distributed across the available input modes. A multimodal language processing architecture is needed to integrate semantic content from the different modes. Johnston 1998a proposes a modular approach to multimodal language processing in which spoken ...

Pedestrians Tracking in a Camera Network

As the number of cameras installed across a video surveillance network increases, the ability of security staff to attentively scan all the video feeds decreases. An intelligent tracking system is therefore vital for security personnel to do their jobs well. Tracking people as they move through a camera network with non-overlapping field...

Scene Parsing Using Scene Attributes As Global Features

Data-driven methods have been proven very effective for the task of scene parsing. A crucial step in these methods is to retrieve a set of visually similar scenes from existing image collections for the query image according to certain global scene representations. In this work, we incorporate scene attributes into data-driven scene parsing systems as global scene features. We show that when us...

Hierarchical Feature For Scene Parsing Using Fully Recurrent Network

In scene parsing, wide-range contextual information is not effectively encoded. Scene parsing segments a scene into different regions associated with semantic categories. Its main objective is to reduce the semantic gap between humans and machines in scene understanding. Applications of scene parsing include object detection, text detection o...

Journal title:

CoRR

Volume abs/1709.09843

Pages -

Publication date 2017
